Image processing device and method

ABSTRACT

An image processing apparatus and method which can achieve a reduction in size of the crossbar circuit and achieve a higher speed of processing, which perform DDA processing (ST11), then read out texture data from a memory (ST12), perform sub-word reallocation processing (ST13), then perform texture filtering (ST14), then globally distribute data by the crossbar circuit 13 to a first operation processing element of each processing module (ST15), then perform processing at the pixel level, specifically use the texture data after filtering and the various types of data after rasterization to perform operations by pixel units and draw the pixel data passing the various types of tests in the processing at the pixel level to a frame buffer on a memory module (ST16).

TECHNICAL FIELD

[0001] The present invention relates to an image processing apparatus and method thereof wherein a plurality of processing devices share processing data to perform parallel processing.

BACKGROUND ART

[0002] In recent years, graphic LSIs for performing 3D computer graphics by hardware at a high speed have spread remarkably. In particular, in game systems and personal computers (PC), such graphic LSIs are often mounted as standard equipment.

[0003] Further, the technological advances being made in graphic LSIs have been fast. Expansion of functions such as the “Vertex Shader” and “Pixel Shader” employed in “DirectX” has been continuing, and performance has been improving at a pace faster than that of CPUs.

[0004] In order to improve the performance of a graphic LSI, it is effective not only to raise the operating frequency of the LSI, but also to utilize the techniques of parallel processing. The techniques of parallel processing may be roughly classified as follows.

[0005] First is a parallel processing method by area division, second is a parallel processing method at a primitive level, and third is a parallel processing method at a pixel level.

[0006] The above classification is based on the granularity (the “particle size”) of the parallel processing. The particle size of the area division parallel processing is the coarsest, while the particle size of the pixel level parallel processing is the finest. These techniques will be summarized below.

[0007] Parallel Processing by Area Division

[0008] This is a technique of dividing a screen into a plurality of rectangular areas and performing parallel processing by assigning each of a plurality of processing units the areas it is to take charge of.

[0009] Parallel Processing at Primitive Level

[0010] This is a technique of imparting different primitives (for example triangles) to a plurality of processing units and making them operate in parallel.

[0011] A view conceptually showing parallel processing at the primitive level is shown in FIG. 1.

[0012] In FIG. 1, PM0 to PMn−1 indicate different primitives, PU0 to PUn−1 indicate processing units, and MM0 to MMn−1 indicate memory modules.

[0013] When primitives PM0 to PMn−1 of roughly equal size are given to the processing units PU0 to PUn−1, the loads on the processing units PU0 to PUn−1 are balanced and efficient parallel processing can be carried out.

[0014] Parallel Processing at Pixel Level

[0015] This is the technique of parallel processing of the finest particle size.

[0016] FIG. 2 is a view conceptually showing parallel processing at the primitive level based on the technique of parallel processing at the pixel level.

[0017] As shown in FIG. 2, in the technique of parallel processing at the pixel level, when rasterizing triangles, pixels are generated in units of rectangular areas referred to as “pixel stamps PS” comprised of pixels arrayed in a 2×8 matrix.

[0018] In the example of FIG. 2, a total of eight pixel stamps from the pixel stamp PS0 to the pixel stamp PS7 are generated. A maximum of 16 pixels included in these pixel stamps PS0 to PS7 are simultaneously processed.

[0019] Because of its finer particle size, this technique achieves better parallel processing efficiency than the other techniques.

[0020] In the case of parallel processing by area division explained above, however, in order to make the processing units operate in parallel efficiently, it is necessary to classify objects to be drawn in the areas in advance, so the load of the scene data analysis is heavy.

[0021] Further, when drawing in the so-called immediate mode, in which drawing starts immediately when object data is given rather than after one frame's worth of the scene data is all present, parallelism cannot be achieved.

[0022] Further, in the case of parallel processing at the primitive level, in actuality, there is variation in the sizes of the primitives PM0 to PMn−1 composing the object, so a difference arises in the time for processing one primitive among the processing units PU0 to PUn−1. When this difference becomes large, the areas in which the processing units draw also differ greatly and the locality of the data is lost, so for example the DRAM comprising the memory modules frequently incurs page misses and the performance is lowered.

[0023] Further, in the case of this technique, there is also the problem of a high interconnect cost. In general, in hardware for graphics processing, in order to broaden the bandwidth of the memory, a plurality of memory modules is used for memory interleaving.

[0024] At this time, as shown in FIG. 1, it is necessary to connect all processing units PU0 to PUn−1 and the built-in memory modules MM0 to MMn−1.

[0025] On the other hand, in the case of the parallel processing at the pixel level, as described above, there is the advantage that the finer particle size makes the parallel processing more efficient. The processing, including the actual filtering, is performed by the routine shown in FIG. 3.

[0026] That is, DDA (digital differential analyzer) parameters, for example, the inclinations of various types of data (Z, texture coordinates, colors, etc.) required for rasterization and other DDA parameters are calculated (ST1).

[0027] Next, texture data is read out from a memory (ST2), sub-word reallocation is performed (ST3), then the data is globally distributed to the processing units by a crossbar circuit (ST4).

[0028] Next, texture filtering is performed (ST5). In this case, the processing units PU0 to PU3 perform four-neighbor interpolation or other filtering by using read texture data and a decimal fraction obtained when calculating a (u, v) address.

[0029] Next, processing at the pixel level (per-pixel operation) is performed, specifically, texture data after the filtering and various types of data after rasterization are used for operations on pixel units (ST6).

[0030] Further, pixel data passing various types of tests in the processing at the pixel level is drawn in a frame buffer and Z-buffer on the memory modules MM0 to MM3.

[0031] By the way, memory access of the texture read system differs from memory access of the graphics generation system, therefore it is necessary to read data from a memory belonging to another module.

[0032] Therefore, for memory access of the texture read system, an interconnect such as a crossbar circuit as described above is necessary.

[0033] However, a related image processing apparatus, as described above, globally distributes data to the processing units, then performs texture filtering, so there are the disadvantages that the amount of data globally distributed is large (for example, 4 Tbps), the crossbar circuit serving as the global bus becomes large in size, and an increase in the speed of processing is obstructed from the viewpoint of interconnect delay etc.
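
As a rough illustration of why distributing unfiltered texture data is costly, the following sketch counts the bits that cross the global bus for one textured pixel, depending on whether filtering happens after or before distribution. The four-texel footprint assumes four-neighbor interpolation; the function name and the 32-bit texel width are illustrative assumptions and are not figures taken from this document.

```python
# Assumption: four-neighbor interpolation reads 4 texels per output pixel.
TEXELS_PER_FILTERED_SAMPLE = 4

def bus_bits_per_pixel(filter_before_distribution: bool, bits_per_texel: int = 32) -> int:
    """Bits sent over the global bus for one textured pixel (hypothetical model)."""
    if filter_before_distribution:
        # Only the single filtered value has to be distributed.
        return bits_per_texel
    # All source texels are distributed and filtered at the destination.
    return TEXELS_PER_FILTERED_SAMPLE * bits_per_texel

related_art = bus_bits_per_pixel(filter_before_distribution=False)
filter_first = bus_bits_per_pixel(filter_before_distribution=True)
print(related_art, filter_first)
```

Under these assumptions the traffic scales with the size of the filter footprint when filtering is deferred until after distribution.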

DISCLOSURE OF THE INVENTION

[0034] An object of the present invention is to provide an image processing apparatus and method thereof that can achieve a reduction in the size of a crossbar circuit and achieve an increase in speed of processing.

[0035] To achieve the above object, a first aspect of the present invention is an image processing apparatus in which a plurality of processing modules share processing data to perform parallel processing, wherein: the plurality of processing modules each comprise: a memory module storing at least data relating to filtering, a processing circuit for obtaining data for filtering and performing assigned processing determined in advance by corresponding memory interleaving based on the processing data, a first operation processing element for performing operation processing in pixel units based on assigned processing data and data after filtering obtained at said processing circuit, and a second operation processing element for performing filtering based on the data for filtering obtained by said processing circuit and the data relating to filtering stored in said memory module and receiving operation processing data from the first operation processing element, then drawing the operation processed data to the memory module and further comprising a crossbar circuit which is a global bus for connecting a plurality of first operation processing elements and a plurality of second operation processing elements of said processing modules, supplying data for filtering obtained by said processing circuit in each processing module to a second operation processing element in the same processing module, supplying data after filtering from a second operation processing element in each processing module to a first operation processing element in a processing module corresponding to the processing, and supplying the operation processing data from the first operation processing element to the second operation processing element.

[0036] In the first aspect, the processing circuit of each processing module comprises a means for adjusting time so that the processing time of the assigned data becomes equal to time when the data after filtering is supplied to the first operation processing element.

[0037] A second aspect of the present invention is an image processing method in which a plurality of processing modules share processing data to perform parallel processing, comprising: obtaining data for filtering and performing assigned processing determined in advance by corresponding memory interleaving based on the processing data in each processing module; performing filtering based on the obtained data for filtering and data relating to filtering stored in a memory module; supplying data after filtering in each processing module, through a global bus, to a predetermined processing module; and performing operation processing in pixel units based on the obtained assigned processing data and the data after filtering, and drawing the operation processed data to said memory module, in a processing module receiving the data after filtering.

[0038] Preferably, the method further comprises a step of adjusting the timing in each processing module so that the processing time of the assigned data becomes equal to the time when the data after filtering is supplied.

[0039] Moreover, in the present invention, the processing requiring the filtering is processing relating to texture.

[0040] Further, the above parallel processing is parallel processing at the pixel level.

[0041] According to the present invention, for example the setup circuit performs an operation on the vertex data, sets up a primitive, and outputs the assigned texture's worth of setup information to the processing modules.

[0042] The processing circuit of each processing module calculates for example the DDA parameters, specifically the inclination etc. of various types of data (Z, texture coordinates, colors, etc.) and other DDA parameters required for rasterization based on information from the setup circuit.

[0043] Further, each processing circuit judges, based on the parameter data, whether for example a triangle is in its assigned area and, when it is, performs rasterization.

[0044] Moreover, each processing circuit calculates the MipMap level by calculating the LOD and calculates the (u, v) address for texture access.

[0045] Then, each processing circuit outputs the obtained texture coordinates and address information for texture access etc. to the second operation processing element.

[0046] On the other hand, each processing circuit supplies the obtained color and other non-texture information to the first operation processing element.

[0047] Further, the second operation processing element of each processing module receives coordinate data and address data relating to texture supplied from the processing circuit, reads out the texture data from the memory module, and performs four-neighbor interpolation or other texture filtering using the read texture data and a decimal fraction obtained at the time of calculation of the (u, v) address.

[0048] The texture data after filtering from a second operation processing element is supplied via the crossbar circuit to for example the first operation processing element of the processing module having a frame buffer corresponding to a stamp.

[0049] The first operation processing element of a processing module performs processing at the pixel level based on data other than the texture information supplied from the processing circuit and data after texture filtering by the second operation processing element of a processing module received through the crossbar circuit and outputs the result to the second operation processing element.

[0050] Then, the second operation processing element receives the result of processing at the pixel level supplied from the first operation processing element and draws the pixel data passing various types of tests in the processing at the pixel level in the memory module.

[0051] The above processing is performed in parallel in the modules.

BRIEF DESCRIPTION OF THE DRAWINGS

[0052] FIG. 1 is a view showing conceptually parallel processing at a primitive level.

[0053] FIG. 2 is a view showing conceptually parallel processing at a primitive level based on the technique of parallel processing at the pixel level.

[0054] FIG. 3 is a view explaining the processing routine including texture filtering of a related image processing apparatus.

[0055] FIG. 4 is a block diagram of the configuration of an embodiment of an image processing apparatus according to the present invention.

[0056] FIG. 5 is a view of the basic architecture and processing flow of an image processing apparatus according to the present embodiment.

[0057] FIG. 6 is a view of an example of the configuration of key parts of a DDA circuit according to the present embodiment.

[0058] FIG. 7 is a view of a concrete example of the configuration of a crossbar circuit according to the present embodiment.

[0059] FIG. 8 is a view describing conceptually the processing of the image processing apparatus according to the present embodiment.

[0060] FIG. 9 is a view of the conceptual processing flow of an image processing apparatus according to the present embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION

[0061] FIG. 4 is a block diagram of the configuration of an embodiment of an image processing apparatus according to the present invention.

[0062] An image processing apparatus 10 according to the present embodiment has, as shown in FIG. 4, a setup circuit 11, processing modules 12-0 to 12-3, and a crossbar circuit 13.

[0063] In the present image processing apparatus 10, a plurality of (four in the present embodiment) processing modules 12-0 to 12-3 are connected in parallel to the setup circuit 11. The processing modules 12-0 to 12-3 share processing data to perform parallel processing.

[0064] For the texture read system, memory access to other processing modules is necessary; the crossbar circuit 13 serving as a global access bus is used for this access.

[0065] Below, the configurations and functions of the components are described sequentially with reference to the drawings.

[0066] The setup circuit 11 controls the transfer of data with a CPU and an external memory and the transfer of data with the processing modules 12-0 to 12-3, performs operations on the vertex data, sets up one primitive, and outputs the assigned texture's worth of setup information to the processing modules 12-0 to 12-3.

[0067] Specifically, the setup circuit 11 performs a per-vertex operation when data is input.

[0068] In this processing, when vertex data comprising three-dimensional coordinates, a normal vector, and texture coordinates is input, an operation is performed on the vertex data. Representative operations are operation processing for coordinate conversion such as deformation of an object, projection on a screen, etc., operation processing for lighting, and operation processing for clipping.

[0069] The processing module 12-0 has a DDA (digital differential analyzer) circuit 121-0 as a processing circuit, a first operation processing element (operation processing element 1) 122-0, a second operation processing element (operation processing element 2) 123-0, and a memory module (MEM) 124-0 formed by a DRAM for example.

[0070] Similarly, the processing module 12-1 has a DDA circuit 121-1 as a processing circuit, a first operation processing element (operation processing element 1) 122-1, a second operation processing element (operation processing element 2) 123-1, and a memory module (MEM) 124-1 formed by a DRAM for example.

[0071] The processing module 12-2 has a DDA circuit 121-2 as a processing circuit, a first operation processing element (operation processing element 1) 122-2, a second operation processing element (operation processing element 2) 123-2, and a memory module (MEM) 124-2 formed by a DRAM for example.

[0072] The processing module 12-3 has a DDA circuit 121-3 as a processing circuit, a first operation processing element (operation processing element 1) 122-3, a second operation processing element (operation processing element 2) 123-3, and a memory module (MEM) 124-3 formed by a DRAM for example.

[0073] Then, the first operation processing elements 122-0 to 122-3 and the second operation processing elements 123-0 to 123-3 in the processing modules 12-0 to 12-3, as described in detail later, are connected to each other via the crossbar circuit 13.

[0074] FIG. 5 is a view of the basic architecture and a processing flow of an image processing apparatus according to the present embodiment. Note that in FIG. 5, arrows with circles indicate the flow of data relating to texture and arrows without circles indicate the flow of data relating to pixels.

[0075] In the present embodiment, in the processing modules 12-0 to 12-3, the memory modules 124-0 to 124-3 are interleaved in units of a predetermined size, for example, 4×4 rectangular areas.

[0076] Specifically, as shown in FIG. 5, a so-called frame buffer is interleaved across all memory modules and texture memories are dispersed in the memory modules 124-0 to 124-3.

[0077] The DDA circuit 121-0 in the processing module 12-0 calculates DDA parameters based on information from the setup circuit 11.

[0078] In this processing, the inclinations of various types of data (Z, texture coordinates, colors, etc.) required for rasterization and other DDA parameters are calculated.
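
The inclinations referred to here can be pictured as the per-pixel gradients of an attribute that varies linearly over a triangle. The following is a minimal sketch of that calculation, assuming a plane fit through the three vertices; the function name and the (x, y, value) vertex layout are hypothetical and are not taken from this document.

```python
def attribute_gradients(p0, p1, p2):
    """Return (d/dx, d/dy) of an attribute interpolated linearly over a triangle.

    Each vertex is a tuple (x, y, value), where value may be Z, a texture
    coordinate, or a color channel.
    """
    (x0, y0, a0), (x1, y1, a1), (x2, y2, a2) = p0, p1, p2
    # Twice the signed area of the triangle; zero means it is degenerate.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    dadx = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    dady = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    return dadx, dady

def interpolate(p0, gradients, x, y):
    """Evaluate the attribute at pixel (x, y) from vertex p0 and the gradients."""
    x0, y0, a0 = p0
    dadx, dady = gradients
    return a0 + dadx * (x - x0) + dady * (y - y0)
```

With the gradients in hand, a rasterizer can step an attribute from pixel to pixel by repeated addition, which is the usual reason for computing them once per primitive.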

[0079] Also, the DDA circuit 121-0 judges, based on the parameter data, whether a triangle for example is in an assigned area for the circuit to process and, when in the area, performs rasterization.

[0080] Specifically, it judges whether the triangle belongs to the area assigned to it, for example, an area interleaved in 4×4 pixel rectangular area units, and, when it does, rasterizes various types of data (Z, texture coordinates, colors, etc.). In this case, the generated unit is 2×2 pixels per cycle per local module.
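
One way to picture interleaving in 4×4 pixel units across four modules is the sketch below. The document does not state which tile goes to which module, so the 2×2 repeating arrangement, the constants, and the function names are all assumptions made for illustration.

```python
TILE = 4        # tile edge in pixels (4x4 rectangular area units)
MODULES_X = 2   # assumed: tiles alternate between two modules horizontally
MODULES_Y = 2   # assumed: and between two module rows vertically

def owner_module(x: int, y: int) -> int:
    """Index (0 to 3) of the module assumed to own screen pixel (x, y)."""
    tile_x, tile_y = x // TILE, y // TILE
    return (tile_x % MODULES_X) + MODULES_X * (tile_y % MODULES_Y)

def is_assigned(module: int, x: int, y: int) -> bool:
    """True when pixel (x, y) lies in an area assigned to the given module."""
    return owner_module(x, y) == module

# Pixels of one 8x8 screen region grouped by owning module.
owned = {m: [(x, y) for y in range(8) for x in range(8) if is_assigned(m, x, y)]
         for m in range(4)}
print({m: len(pixels) for m, pixels in owned.items()})
```

Under this assumed layout every module owns the same number of pixels in any region whose size is a multiple of 8×8, which is the load-balancing property interleaving is meant to provide.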

[0081] Next, the DDA circuit 121-0 corrects the perspective of the texture coordinates. This processing stage includes calculation of the MipMap level by calculation of the LOD (level of detail) and calculation of the (u, v) address for texture access.
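
The document does not give the formulas used for the LOD or the (u, v) address. The following sketch uses a commonly used approach, assumed here for illustration only: the MipMap level is the base-2 logarithm of the larger screen-space texel footprint, and the address is split into an integer texel index and a decimal fraction. All names and parameters are hypothetical.

```python
import math

def mip_level(dudx, dvdx, dudy, dvdy, max_level):
    """MipMap level from the texel-space derivatives of (u, v) per screen pixel."""
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    if rho <= 1.0:
        return 0  # magnification: use the base level
    return min(int(math.log2(rho)), max_level)

def texel_address(u, v, level, base_width, base_height):
    """Split normalized (u, v) into an integer texel index and a fraction.

    Returns (iu, iv, fu, fv); the fractions are what a four-neighbor
    filter would use as interpolation weights.
    """
    width = max(1, base_width >> level)
    height = max(1, base_height >> level)
    x = u * width - 0.5   # texel centers sit at half-integer positions
    y = v * height - 0.5
    iu, iv = math.floor(x), math.floor(y)
    return iu, iv, x - iu, y - iv
```

For example, under these assumptions a footprint of four texels per pixel selects level 2, and the center of an 8×8 texture maps to texel (3, 3) with fractions of one half in each direction.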

[0082] The DDA circuit 121-0, as shown in FIG. 6 for example, processes the texture coordinates, the address information for texture access, etc. in a texture system DDA portion 1211 and outputs information relating to texture to the second operation processing element 123-0 via the first operation processing element 122-0 and the crossbar circuit 13.

[0083] On the other hand, the DDA circuit 121-0 performs color and other non-texture processing in another DDA portion 1212 and outputs the result to the first operation processing element 122-0.

[0084] In the present embodiment, each of the DDA circuits 121-0 to 121-3 is provided with a FIFO (first-in first-out) only at the data input side of the other DDA portion 1212 and thereby adjusts the timing to allow for the time taken by filtering in the texture system.
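
The timing adjustment can be pictured as a fixed-depth delay line in front of the non-texture path, so that color data reaches the first operation processing element in the same cycle as the matching filtered texture data. The sketch below is a software analogy only; the class name and the latency value are assumptions, not details from this document.

```python
from collections import deque

class DelayFifo:
    """Delays each pushed item by a fixed number of cycles (hypothetical model)."""

    def __init__(self, latency: int):
        # Pre-fill with empty slots so the first real item emerges
        # exactly `latency` pushes after it went in.
        self._slots = deque([None] * latency)

    def push(self, item):
        """Insert one item per cycle and return the item leaving the FIFO."""
        self._slots.append(item)
        return self._slots.popleft()

# Assumed: texture filtering takes 3 cycles longer than the color path.
fifo = DelayFifo(latency=3)
outputs = [fifo.push(pixel_id) for pixel_id in range(5)]
print(outputs)
```

Placing the FIFO at the input side of the shorter path means only unprocessed setup data is buffered, which is presumably cheaper than buffering per-pixel results at the output.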

[0085] Moreover, the texture system DDA portion 1211 generates data of the texture assigned to it for all pixels, while the other DDA portion 1212 generates only portions assigned by memory interleaving.

[0086] The first operation processing element 122-0 performs processing at the pixel level (per-pixel operation) based on data other than the texture information supplied from the DDA circuit 121-0 and data after texture filtering by the second operation processing elements 123-0 to 123-3 of the processing modules 12-0 to 12-3 received through the crossbar circuit 13 and outputs the results to the second operation processing element 123-0 via the crossbar circuit 13.

[0087] In the processing at the pixel level, the texture data after filtering and the various types of data after rasterization are used for operations on pixel units. The processing performed here corresponds to pixel-level lighting (per-pixel lighting) or other so-called pixel shader processing.

[0088] The second operation processing element 123-0 receives coordinate data and address data relating to texture supplied from the DDA circuit 121-0, reads out the texture data from the memory module 124-0, performs texture filtering, and outputs the texture data after the filtering via the crossbar circuit 13 to any of the first operation processing elements 122-0 to 122-3 in processing modules having a frame buffer corresponding to a stamp.

[0089] In this case, the second operation processing element 123-0 performs four-neighbor interpolation or other filtering by using the read texture data and a decimal fraction obtained at the time of calculation of the (u, v) address.
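
Four-neighbor interpolation is commonly implemented as bilinear filtering, in which the decimal fractions of the (u, v) address weight the four surrounding texels. The following minimal sketch assumes a single-channel texture stored as a list of rows and clamps addresses at the texture edge; these choices and the function name are illustrative assumptions.

```python
def bilinear(texture, iu, iv, fu, fv):
    """Blend the four texels around (iu + fu, iv + fv).

    texture: list of rows of numbers; iu, iv: integer texel index;
    fu, fv: decimal fractions in [0, 1) from the (u, v) address calculation.
    """
    height, width = len(texture), len(texture[0])

    def clamp(value, upper):
        return min(max(value, 0), upper)

    u0, u1 = clamp(iu, width - 1), clamp(iu + 1, width - 1)
    v0, v1 = clamp(iv, height - 1), clamp(iv + 1, height - 1)
    # Interpolate along u on the two rows, then along v between the rows.
    top = texture[v0][u0] * (1 - fu) + texture[v0][u1] * fu
    bottom = texture[v1][u0] * (1 - fu) + texture[v1][u1] * fu
    return top * (1 - fv) + bottom * fv
```

Four texels are read for each filtered value, so performing this step before the data enters the crossbar circuit leaves only one value per pixel to distribute.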

[0090] Moreover, the second operation processing element 123-0 receives the result of processing at the pixel level supplied from the first operation processing element 122-0 and draws the pixel data passing various tests in the processing at the pixel level to the memory module 124-0.
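
The per-pixel operation and the subsequent drawing step can be sketched as follows. The document does not list the tests applied or the shading operation used, so this example assumes a simple modulation of the filtered texel by the interpolated color and a single less-than depth test; both functions and their names are hypothetical.

```python
def modulate(texel, color):
    """Per-pixel operation: multiply filtered texel by interpolated color (RGB in [0, 1])."""
    return tuple(t * c for t, c in zip(texel, color))

def draw_if_visible(frame, zbuf, x, y, rgb, z):
    """Write the pixel only if it passes the (assumed) depth test."""
    if z >= zbuf[y][x]:
        return False          # hidden behind what is already drawn
    zbuf[y][x] = z
    frame[y][x] = rgb
    return True

# A 1x1 frame buffer and Z-buffer, cleared to black and the far plane.
frame = [[(0.0, 0.0, 0.0)]]
zbuf = [[1.0]]
shaded = modulate((1.0, 0.5, 0.0), (0.5, 0.5, 0.5))
drawn = draw_if_visible(frame, zbuf, 0, 0, shaded, 0.5)
print(drawn, frame[0][0])
```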

[0091] The DDA circuit 121-1 in the processing module 12-1 calculates DDA parameters, specifically the inclinations of various types of data (Z, texture coordinates, colors, etc.) and other DDA parameters necessary for rasterization, based on information from the setup circuit 11.

[0092] Also, the DDA circuit 121-1 judges, based on the parameter data, whether a triangle for example is in its assigned area and, when it is, performs rasterization.

[0093] Specifically, it judges whether the triangle is in its assigned area, for example, whether it belongs to an area interleaved in 4×4 pixel rectangular area units and, when it does, rasterizes various types of data (Z, texture coordinates, colors, etc.). In this case, the generated unit is 2×2 pixels per cycle per local module.

[0094] Next, the DDA circuit 121-1 corrects the perspective of the texture coordinates. This processing stage includes calculation of the MipMap level by calculation of the LOD (level of detail) and calculation of the (u, v) address for texture access.

[0095] The DDA circuit 121-1, as shown in FIG. 6 for example, processes the texture coordinates, the address information for texture access, etc. in a texture system DDA portion 1211 and outputs information relating to texture to the second operation processing element 123-1 via the first operation processing element 122-1 and the crossbar circuit 13.

[0096] On the other hand, the DDA circuit 121-1 performs color and other non-texture processing in another DDA portion 1212 and outputs the result to the first operation processing element 122-1.

[0097] The first operation processing element 122-1 performs processing at the pixel level (per-pixel operation) based on data other than the texture information supplied from the DDA circuit 121-1 and data after texture filtering by the second operation processing elements 123-0 to 123-3 of the processing modules 12-0 to 12-3 received through the crossbar circuit 13 and outputs the results to the second operation processing element 123-1 via the crossbar circuit 13.

[0098] In the processing at the pixel level, the texture data after filtering and the various types of data after rasterization are used for operations on pixel units. The processing performed here corresponds to pixel-level lighting or other so-called pixel shader processing.

[0099] The second operation processing element 123-1 receives coordinate data and address data relating to texture supplied from the DDA circuit 121-1, reads out the texture data from the memory module 124-1, performs texture filtering, and outputs the texture data after the filtering via the crossbar circuit 13 to any of the first operation processing elements 122-0 to 122-3 in processing modules having a frame buffer corresponding to a stamp.

[0100] In this case, the second operation processing element 123-1 performs four-neighbor interpolation or other filtering by using the read texture data and a decimal fraction obtained at the time of calculation of the (u, v) address.

[0101] Moreover, the second operation processing element 123-1 receives the results of processing at the pixel level supplied from the first operation processing element 122-1 and draws the pixel data passing various tests in the processing at the pixel level to the memory module 124-1.

[0102] The DDA circuit 121-2 in the processing module 12-2 calculates DDA parameters, specifically the inclinations of various types of data (Z, texture coordinates, colors, etc.) and other DDA parameters necessary for rasterization, based on information from the setup circuit 11.

[0103] Also, the DDA circuit 121-2 judges, based on the parameter data, whether a triangle for example is in its assigned area and, when it is, performs rasterization.

[0104] Specifically, it judges whether the triangle is in its assigned area, for example, whether it belongs to an area interleaved in 4×4 pixel rectangular area units and, when it does, rasterizes various types of data (Z, texture coordinates, colors, etc.). In this case, the generated unit is 2×2 pixels per cycle per local module.

[0105] Next, the DDA circuit 121-2 corrects the perspective of the texture coordinates. This processing stage includes calculation of the MipMap level by calculation of the LOD (level of detail) and calculation of the (u, v) address for texture access.

[0106] The DDA circuit 121-2, as shown in FIG. 6 for example, processes the texture coordinates, the address information for texture access, etc. in a texture system DDA portion 1211 and outputs information relating to texture to the second operation processing element 123-2 via the first operation processing element 122-2 and the crossbar circuit 13.

[0107] On the other hand, the DDA circuit 121-2 performs color and other non-texture processing in another DDA portion 1212 and outputs the result to the first operation processing element 122-2.

[0108] The first operation processing element 122-2 performs processing at the pixel level (per-pixel operation) based on data other than the texture information supplied from the DDA circuit 121-2 and data after texture filtering by the second operation processing elements 123-0 to 123-3 of the processing modules 12-0 to 12-3 received through the crossbar circuit 13 and outputs the results to the second operation processing element 123-2 via the crossbar circuit 13.

[0109] In the processing at the pixel level, the texture data after filtering and the various types of data after rasterization are used for operations on pixel units. The processing performed here corresponds to pixel-level lighting or other so-called pixel shader processing.

[0110] The second operation processing element 123-2 receives coordinate data and address data relating to texture supplied from the DDA circuit 121-2, reads out the texture data from the memory module 124-2, performs texture filtering, and outputs the texture data after the filtering via the crossbar circuit 13 to any of the first operation processing elements 122-0 to 122-3 in processing modules having a frame buffer corresponding to a stamp.

[0111] In this case, the second operation processing element 123-2 performs four-neighbor interpolation or other filtering by using the read texture data and a decimal fraction obtained at the time of calculation of the (u, v) address.

[0112] Moreover, the second operation processing element 123-2 receives the results of processing at the pixel level supplied from the first operation processing element 122-2 and draws the pixel data passing various tests in the processing at the pixel level to the memory module 124-2.

[0113] The DDA circuit 121-3 in the processing module 12-3 calculatesDDA parameters, specifically inclination of various types of data (Z,texture coordinate, colors, etc.) or other DDA parameters necessary forrasterization based on information from the setup circuit 11.

[0114] Also, the DDA circuit 121-3 judges, based on the parameter data,whether a triangle for example is its assigned area and, when theassigned area, performs rasterization.

[0115] Specifically, it judges whether the triangle is its assignedarea, for example, whether it belongs in an area interleaved in 4×4pixel rectangle area units and, when belonging, rasterizes various typesof data (Z, texture coordinates, colors, etc.). In this case, thegenerated unit is 2×2 pixels per cycle per local module.

[0116] Next, the DDA circuit 121-3 corrects the perspective of thetexture coordinates. This processing stage includes calculation of theMipMap level by calculation of the LOD (level of detail) and calculationof the (u, v) address for texture access.

[0117] The DDA circuit 121-3, as shown in FIG. 6 for example, performstexture processing on the texture coordinates by a texture system DDAportion 1211, address information for texture access, etc. and outputsinformation relating to texture to the second operation processingelement 123-3 via the first operation processing element 122-3 and thecrossbar circuit 13.

[0118] On the other hand, the DDA circuit 121-3 performs color and otherprocessing other than texture by another DDA portion 1212 and outputsthe result to the first operation processing element 122-3.

[0119] The first operation processing element 122-3 performs processingat the pixel level (per-pixel operation) based on data other than thetexture information supplied from the DDA circuit 121-3 and data aftertexture filtering by the second operation processing elements 123-0 to123-3 of the processing modules 12-0 to 12-3 received through thecrossbar circuit 13 and outputs the results to the second operationprocessing element 123-3 via the crossbar circuit 13.

[0120] In the processing at the pixel level, the texture data afterfiltering and the various types of data after rasterization are used foroperations on pixel units. The processing performed here corresponds topixel-level lighting or other so-called pixel shader processing.

[0121] The second operation processing element 123-3 receives coordinatedata and address data relating to texture supplied from the DDA circuit121-3, reads out the texture data from the memory module 124-3, performstexture filtering, and outputs the texture data after the filtering viathe crossbar circuit 13 to either of the first operation processingelements 122-0 to 122-3 in processing modules having a frame buffercorresponding to a stamp.

[0122] In this case, the second operation processing element 123-3 performs four-neighbor interpolation or other filtering by using the read texture data and a decimal fraction obtained at the time of calculation of the (u, v) address.
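
By way of a non-limiting illustration, four-neighbor interpolation using the decimal fraction of the (u, v) address may be sketched as follows. The sketch assumes bilinear weighting, a single scalar value per texel, and clamping at the texture edge; none of these details are specified in the source, and the function name is hypothetical.

```python
import math


def four_neighbor_filter(texture, u, v):
    """Bilinearly interpolate the four texels surrounding address (u, v).

    `texture` is a list of rows of floats (assumed layout). Addresses that
    fall outside the texture are clamped to the nearest edge texel.
    """
    height, width = len(texture), len(texture[0])
    u0, v0 = math.floor(u), math.floor(v)
    fu, fv = u - u0, v - v0  # decimal fractions of the (u, v) address

    def texel(x, y):
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return texture[y][x]

    # Blend horizontally along the two rows, then vertically between them.
    top = (1 - fu) * texel(u0, v0) + fu * texel(u0 + 1, v0)
    bottom = (1 - fu) * texel(u0, v0 + 1) + fu * texel(u0 + 1, v0 + 1)
    return (1 - fv) * top + fv * bottom
```

Four texels are read for each sample and a single filtered value is produced, so the amount of data leaving the second operation processing element is smaller than the amount read from the memory module.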

[0123] Moreover, the second operation processing element 123-3 receives processing results of the pixel level supplied from the first operation processing element 122-3 and draws the pixel data passing various tests in the processing at the pixel level to the memory module 124-3.

[0124] FIG. 7 is a view of a concrete example of the configuration of a global bus system in a crossbar circuit according to the present embodiment.

[0125] The crossbar circuit 13, as shown in FIG. 7, has four interconnect groups, namely a first to a fourth interconnect group GRP0 to GRP3, each group being formed of four texture lines.

[0126] A first interconnect group GRP0 has four interconnects tex00 to tex03, a second interconnect group GRP1 has four interconnects tex10 to tex13, a third interconnect group GRP2 has four interconnects tex20 to tex23, and a fourth interconnect group GRP3 has four interconnects tex30 to tex33.

[0127] Further, a terminal of the second operation processing element 123-0 in the processing module 12-0 is connected to the interconnect tex00 in the first interconnect group GRP0, the interconnect tex10 in the second interconnect group GRP1, the interconnect tex20 in the third interconnect group GRP2, and the interconnect tex30 in the fourth interconnect group GRP3.

[0128] In the same way, a terminal of the second operation processing element 123-1 in the processing module 12-1 is connected to the interconnect tex01 in the first interconnect group GRP0, the interconnect tex11 in the second interconnect group GRP1, the interconnect tex21 in the third interconnect group GRP2, and the interconnect tex31 in the fourth interconnect group GRP3.

[0129] A terminal of the second operation processing element 123-2 in the processing module 12-2 is connected to the interconnect tex02 in the first interconnect group GRP0, the interconnect tex12 in the second interconnect group GRP1, the interconnect tex22 in the third interconnect group GRP2, and the interconnect tex32 in the fourth interconnect group GRP3.

[0130] A terminal of the second operation processing element 123-3 in the processing module 12-3 is connected to the interconnect tex03 in the first interconnect group GRP0, the interconnect tex13 in the second interconnect group GRP1, the interconnect tex23 in the third interconnect group GRP2, and the interconnect tex33 in the fourth interconnect group GRP3.

[0131] The four interconnects tex00-tex03 in the first interconnect group GRP0 are connected to the terminal of the first operation processing element 122-0 in the processing module 12-0.

[0132] In the same way, the four interconnects tex10-tex13 in the second interconnect group GRP1 are connected to the terminal of the first operation processing element 122-1 in the processing module 12-1.

[0133] The four interconnects tex20-tex23 in the third interconnect group GRP2 are connected to the terminal of the first operation processing element 122-2 in the processing module 12-2.

[0134] The four interconnects tex30-tex33 in the fourth interconnect group GRP3 are connected to the terminal of the first operation processing element 122-3 in the processing module 12-3.
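
By way of a non-limiting illustration, the connection pattern of the global bus system of FIG. 7 may be summarized as a table in which the interconnect tex<g><m> carries data from the second operation processing element 123-<m> to the first operation processing element 122-<g>. The following sketch builds that table; the function names and the dictionary representation are hypothetical and serve only to restate the wiring described above.

```python
NUM_MODULES = 4  # processing modules 12-0 to 12-3


def interconnect_name(group, source):
    """Name of the texture line in group GRP<group> driven by module <source>."""
    return f"tex{group}{source}"


def build_crossbar(num_modules=NUM_MODULES):
    """Map each interconnect to (driving element, receiving element).

    Group GRP<g> gathers one line from every second operation processing
    element and terminates at the first operation processing element 122-<g>.
    """
    wiring = {}
    for group in range(num_modules):
        for source in range(num_modules):
            wiring[interconnect_name(group, source)] = (
                f"123-{source}",  # second operation processing element
                f"122-{group}",   # first operation processing element
            )
    return wiring
```

With four modules the table holds sixteen texture lines, four per interconnect group, and every first operation processing element can receive filtered texture data from every second operation processing element.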

[0135] Processing in an image processing apparatus 10 having such a configuration is performed as shown conceptually in FIG. 8.

[0136] That is, the data from the setup circuit 11 is distributed in the modules to the texture DDA portion 1211 and the other DDA portion 1212. Texture filtering is performed by the second operation processing element 123 based on texture information from the texture DDA portion 1211 and texture data in the memory 124 (flow of data indicated by (1)).

[0137] The texture data after the filtering is distributed by the crossbar circuit 13 to the first operation processing element 122 in a requested module (flow of data indicated by (2)).

[0138] Then, processing at the pixel level is performed in the first operation processing element 122 and the result is sent to the second operation processing element 123 via the crossbar circuit 13, thus the data is drawn in the memory module 124 (flow of data indicated by (3)).

[0139] Next, the operation by the configuration of the above FIG. 4 will be described with reference to FIG. 5.

[0140] First, the setup circuit 11 performs an operation on the vertex data, sets up a primitive, and outputs the assigned texture's worth of setup information to the processing modules 12-0 to 12-3.

[0141] The DDA circuits 121-0 to 121-3 in the processing modules 12-0 to 12-3 calculate DDA parameters, specifically inclination of various types of data (Z, texture coordinates, colors, etc.) or other DDA parameters necessary for rasterization based on information from the setup circuit 11.

[0142] The DDA circuits 121-0 to 121-3 judge, based on the parameter data, whether for example a triangle falls within their assigned areas and, when it does, perform rasterization.

[0143] Moreover, the DDA circuits 121-0 to 121-3 calculate the MipMap level by calculating the LOD and calculate the (u, v) address for texture access.

[0144] Then, the DDA circuits 121-0 to 121-3 output texture coordinates obtained by the texture system DDA portion 1211 and address information for texture access etc. to the second operation processing elements 123-0 to 123-3 via the first operation processing elements 122-0 to 122-3 and the crossbar circuit 13.

[0145] On the other hand, the DDA circuits 121-0 to 121-3 supply color and other information other than texture obtained by the other DDA portion 1212 to the first operation processing elements 122-0 to 122-3.

[0146] The second operation processing elements 123-0 to 123-3 in the processing modules 12-0 to 12-3 receive coordinate data and address data relating to texture supplied from the DDA circuits 121-0 to 121-3, read out texture data from the memory modules 124-0 to 124-3, then perform four-neighbor interpolation or other texture filtering using the read texture data and a decimal fraction obtained by calculation of the (u, v) address.

[0147] The texture data after the filtering from the second operation processing elements 123-0 to 123-3 are supplied via the crossbar circuit 13 to the first operation processing element 122-1 in, for example, the processing module 12-1 having a frame buffer corresponding to a stamp.

[0148] The first operation processing element 122-1 in the processing module 12-1 performs processing at the pixel level based on data other than texture information supplied from the DDA circuit 121-1 and data after texture filtering by the second operation processing elements 123-0 to 123-3 of the processing modules 12-0 to 12-3 received via the crossbar circuit 13 and outputs the results to the second operation processing element 123-1.

[0149] Then, the second operation processing element 123-1 receives the processing results at the pixel level supplied by the first operation processing element 122-1 and draws the pixel data passing various types of tests in the processing at the pixel level in the memory module 124-1.
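
By way of a non-limiting illustration, the processing at the pixel level followed by drawing of the pixel data that passes a test may be sketched as follows. The source does not specify the per-pixel operation or the tests; the sketch assumes a simple modulation of the color by the filtered texture data and a "less than" depth test, and all names are hypothetical.

```python
def shade_and_draw(frame_buffer, depth_buffer, x, y, z, color, filtered_texel):
    """Combine color with the filtered texel, then draw if the test passes.

    Assumptions (not from the source): the per-pixel operation is a
    component-wise multiply, and the only test is a "less than" depth test.
    Returns True when the pixel was drawn.
    """
    shaded = tuple(c * t for c, t in zip(color, filtered_texel))
    if z < depth_buffer[y][x]:
        depth_buffer[y][x] = z
        frame_buffer[y][x] = shaded
        return True
    return False
```

In terms of the embodiment, the multiply would correspond to the work of the first operation processing element and the test and write to the work of the second operation processing element that owns the memory module.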

[0150] The above processing is performed in parallel at the modules.

[0151] As described above, the present embodiment, as shown in FIG. 9, performs DDA processing (ST11), then reads out texture data from a memory (ST12), performs sub-word reallocation processing (ST13), then performs texture filtering (ST14), then globally distributes data by the crossbar circuit 13 to a first operation processing element of each processing module (ST15), then performs processing at the pixel level, specifically uses the texture data after filtering and the various types of data after rasterization to perform operations by pixel units and draws the pixel data passing the various types of tests in the processing at the pixel level to a frame buffer on a memory module (ST16). The present embodiment can therefore exhibit the following effects.

[0152] That is, since the data is distributed after being decreased by filtering, the crossbar circuit 13 serving as a global bus can be made small in size.
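
By way of a non-limiting illustration, the reduction in data carried by the global bus may be estimated as follows. The sketch assumes four-neighbor interpolation, in which four texels are read for each filtered value, and assumes that one texel and one filtered value each occupy one bus word; the function name and both parameters are hypothetical.

```python
def global_bus_words(num_samples, texels_per_sample=4, filter_before_bus=True):
    """Words crossing the global bus for `num_samples` texture samples.

    Filtering before distribution sends one filtered value per sample;
    distributing unfiltered data would send every neighboring texel.
    """
    per_sample = 1 if filter_before_bus else texels_per_sample
    return num_samples * per_sample
```

Under these assumptions, 64 samples require 64 words on the global bus when filtering precedes distribution, as opposed to 256 words if the unfiltered texels were distributed.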

[0153] Further, since the flow of data before the filtering can be localized, the path from a memory module requiring a wide bandwidth to a second operation processing element is localized and thus a higher speed of processing can be achieved.

[0154] As a result, there is the advantage that it is possible to realize an image processing apparatus which is easy to design and which can reduce the interconnect cost and interconnect delay.

INDUSTRIAL APPLICABILITY

[0155] As described above, an image processing apparatus and method thereof according to the present invention can reduce the number of interconnects of a crossbar circuit and make the circuit small in size when a plurality of processing apparatuses share processing data to perform parallel processing. As a result, the design is easy and the interconnect cost and interconnect delay can be reduced, so the invention can be applied to a graphics LSI etc.

1. An image processing apparatus in which a plurality of processing modules share processing data to perform parallel processing, wherein: the plurality of processing modules each comprise: a memory module storing at least data relating to filtering, a processing circuit for obtaining data for filtering and performing assigned processing determined in advance by corresponding memory interleaving based on the processing data, a first operation processing element for performing operation processing in pixel units based on assigned processing data and data after filtering obtained at said processing circuit, and a second operation processing element for performing filtering based on the data for filtering obtained by said processing circuit and the data relating to filtering stored in said memory module and receiving operation processing data from said first operation processing element, then drawing the operation processed data to the memory module and further comprises a crossbar circuit which is a global bus for connecting a plurality of first operation processing elements and a plurality of second operation processing elements of said processing modules, supplying data for filtering obtained by said processing circuit in each processing module to a second operation processing element in the same processing module, supplying data after filtering from a second operation processing element in each processing module to a first operation processing element in a processing module corresponding to the processing, and supplying the operation processing data from the first operation processing element to the second operation processing element.
2. An image processing apparatus as set forth in claim 1, wherein the processing circuit of each processing module comprises a means for adjusting time so that the processing time of the assigned data becomes equal to time when the data after filtering is supplied to the first operation processing element.
3. An image processing apparatus as set forth in claim 1, further comprising a setup circuit for performing an operation on the vertex data of a primitive, setting up one primitive, and outputting assigned data to the processing circuits of the processing modules.
4. An image processing apparatus as set forth in claim 1, wherein the processing requiring the filtering is processing relating to texture.
5. An image processing apparatus as set forth in claim 1, wherein said parallel processing is parallel processing at the pixel level.
6. An image processing method in which a plurality of processing modules share processing data to perform parallel processing; comprising steps of: obtaining data for filtering and performing assigned process determined in advance by corresponding memory interleaving based on the processing data in each processing module; performing filtering based on the obtained data for filtering and data relating to filtering stored in a memory module; supplying data after filtering in each processing module, through a global bus, to a predetermined processing module; and performing operation processing in pixel unit based on the obtained assigned processing data and the data after filtering, and drawing the operation processed data to said memory module, in a processing module receiving the data after filtering.
7. An image processing method as set forth in claim 6, further comprising a step of adjusting each processing module time so that processing time of the assigned data becomes equal to time when the data after filtering is supplied.
8. An image processing method as set forth in claim 6, wherein the processing requiring the filtering is processing relating to texture.
9. An image processing method as set forth in claim 6, wherein the parallel processing is parallel processing at the pixel level.